Nepali Speech Recognition using RNN-CTC Model
Authors
Abstract
Similar resources
HPC Project: CTC loss for RNN speech recognition
One of the major challenges in speech recognition, or in any other field concerned with structured prediction, is the alignment of two different sequences. The training data for an RNN is a set of utterances, each consisting of audio recorded via a regular microphone and a transcription of the spoken words. This transcription may be in terms of either phonemes or characters. It is...
Language Adaptive Multilingual CTC Speech Recognition
Recently, it has been demonstrated that speech recognition systems are able to achieve human parity. While much research is done for resource-rich languages like English, there exists a long tail of languages for which no speech recognition systems yet exist. The major obstacle in building systems for new languages is the lack of available resources. In the past, several methods have been pr...
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-base...
Handwritten Digit String Recognition by Combination of Residual Network and RNN-CTC
Recurrent neural networks (RNNs) and connectionist temporal classification (CTC) have shown success in many sequence labeling tasks, owing to their strong ability to handle problems where the alignment between the inputs and the target labels is unknown. The residual network is a new convolutional neural network structure that works well in various computer vision tasks. In this paper, we tak...
Multitask Learning with CTC and Segmental CRF for Speech Recognition
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labe...
Journal
Journal title: International Journal of Computer Applications
Year: 2019
ISSN: 0975-8887
DOI: 10.5120/ijca2019918401